AITopics | count-min sketch

A Bayesian Nonparametric View on Count-Min Sketch

Neural Information Processing SystemsMar-16-2026, 17:58:47 GMT

The count-min sketch is a time-and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data with known ground truth, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Tharun Kumar Reddy Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, Anshumali Shrivastava

Neural Information Processing SystemsFeb-12-2026, 11:37:36 GMT

Neural Information Processing Systems http://nips.cc/

classifier, dataset, mach, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Neural Information Processing SystemsDec-25-2025, 12:28:02 GMT

In the last decade, it has been shown that many hard AI tasks, especially in NLP, can be naturally modeled as extreme classification problems leading to improved precision. However, such models are prohibitively expensive to train due to the memory bottleneck in the last layer. For example, a reasonable softmax layer for the dataset of interest in this paper can easily reach well beyond 100 billion parameters (> 400 GB memory). To alleviate this problem, we present Merged-Average Classifiers via Hashing (MACH), a generic $K$-classification algorithm where memory provably scales at $O(\log K)$ without any assumption on the relation between classes. MACH is subtly a count-min sketch structure in disguise, which uses universal hashing to reduce classification with a large number of classes to few embarrassingly parallel and independent classification tasks with a small (constant) number of classes.

amazon search, count-min sketch, extreme classification, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Bayesian Nonparametric View on Count-Min Sketch

Neural Information Processing SystemsNov-20-2025, 21:44:09 GMT

The count-min sketch is a time-and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data with known ground truth, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.

bayesian nonparametric view, count-min sketch, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

A Bayesian Nonparametric View on Count-Min Sketch

Diana Cai, Michael Mitzenmacher, Ryan P. Adams

Neural Information Processing SystemsNov-20-2025, 14:18:54 GMT

Using simulated data and text data, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process,

machine learning, natural language, sketch, (20 more...)

Neural Information Processing Systems

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Architecture (0.68)
(2 more...)

Add feedback

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Tharun Kumar Reddy Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, Anshumali Shrivastava

Neural Information Processing SystemsOct-2-2025, 22:18:24 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Adaptive-GraphSketch: Real-Time Edge Anomaly Detection via Multi-Layer Tensor Sketching and Temporal Decay

Ekle, Ocheme Anthony, Eberle, William

arXiv.org Artificial IntelligenceSep-16-2025

Anomaly detection in dynamic graphs is essential for identifying malicious activities, fraud, and unexpected behaviors in real-world systems such as cybersecurity and power grids. However, existing approaches struggle with scalability, probabilistic interpretability, and adaptability to evolving traffic patterns. In this paper, we propose ADAPTIVE-GRAPHSKETCH, a lightweight and scalable framework for real-time anomaly detection in streaming edge data. Our method integrates temporal multi-tensor sketching with Count-Min Sketch using Conservative Update (CMS-CU) to compactly track edge frequency patterns with bounded memory, while mitigating hash collision issues. We incorporate Bayesian inference for probabilistic anomaly scoring and apply Exponentially Weighted Moving Average (EWMA) for adaptive thresholding tuned to burst intensity. Extensive experiments on four real-world intrusion detection datasets demonstrate that ADAPTIVE-GRAPHSKETCH outperforms state-of-the-art baselines such as ANOEDGE-G/L, MIDAS-R, and F-FADE, achieving up to 6.5% AUC gain on CIC-IDS2018 and up to 15.6% on CIC-DDoS2019, while processing 20 million edges in under 3.4 seconds using only 10 hash functions. Our results show that ADAPTIVE-GRAPHSKETCH is practical and effective for fast, accurate anomaly detection in large-scale streaming graphs. Keywords: Anomaly Detection, Streaming, Real-time, Dynamic Graphs, Edge Streams, Tensor Sketching

data mining, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.11633

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.34)

Add feedback

Faithful Embeddings for Knowledge Base Queries: Supplementary Material Haitian Sun Andrew O. Arnold

Neural Information Processing SystemsAug-17-2025, 11:15:23 GMT

A count-min sketch, as used here, is an approximation of a vector representation of a weighted set.

count-min sketch, emql, sketch, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada (0.04)

Industry:

Media > Film (0.47)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.40)

Add feedback

fe74074593f21197b7b7be3c08678616-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 11:15:16 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Reviews: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Neural Information Processing SystemsJan-24-2025, 11:17:20 GMT

This paper studies the task of extreme classification with a large amount of target categories. It developed a hashing-based algorithm, MACH. Then a classifier is trained and applied for each hash mapping, on the reduced problem with much smaller amount of target classes. The prediction results of the sub-classifiers are then combined to re-constructed the final output. The proposed methods are demonstrated to be both efficient and effective in multiple datasets.

amazon search, count-min sketch, extreme classification, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

Filters

Collaborating Authors

count-min sketch

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

A Bayesian Nonparametric View on Count-Min Sketch

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

A Bayesian Nonparametric View on Count-Min Sketch

A Bayesian Nonparametric View on Count-Min Sketch

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

Adaptive-GraphSketch: Real-Time Edge Anomaly Detection via Multi-Layer Tensor Sketching and Temporal Decay

Faithful Embeddings for Knowledge Base Queries: Supplementary Material Haitian Sun Andrew O. Arnold

fe74074593f21197b7b7be3c08678616-Paper.pdf

Reviews: Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products